

#### **COMP 273**

Virtual Memory

Part 1



Prof. Joseph Vybihal



#### Announcements

- Course evaluation
- Exam





#### Part 1

Virtual Memory Basics





# COMP 273



#### High addresses Per-process data IO registers (h/w dependent) Only accessible to OS routines Shared by all tasks Kernel data Kernel code Stack (grows down) Accessible to Heap (grows up) user programs Declared data Program code Low addresses

## Standard MIPS Memory

FIGURE 6.1 Memory map for a protected process

u vydinai (C) ZULS



#### Motivation for VM

• A simulator for memory giving your computer the impression that it has more RAM.

• Removes the <u>burden</u> from a programmer in managing limited RAM.

• VM helps to <u>allow</u> multiprocessing by simulating more space in RAM





### Memory Management Types

- None
- None with cheating by programmer
  - Terminate Stay Resident (TSR)
- Compiler Managed
  - Overlaying
- OS Managed
  - Page swapping & Virtual Memory





Management By: No one

Memory Type : Full Ownership

OS

Video

Free Space

Zero

#### Classical Software Development:

- Program compiled with no special features
- Linker adds a loader to the executable
- Loader inserts code into free space at a given start address
- OS notified of its existence
- Program executes to completion then terminates
- OS notified of termination and removed from RAM







Management By: Programmer (cheating?)

Memory Type : TSR (Terminate and Stay Resident)



#### Software Development:

- Compiled with OS notification turned off
- Linker adds a loader to the executable
- Loader inserts code into free space at a given start address
- OS NOT notified of its existence
- Program executes
  - Modifies the OS interrupt vector (point to itself)
  - Then it terminates
- OS NOT notified of termination
  - Program NOT removed from RAM
- A subsequent program loaded into RAM can use TSR
  - Uses Interrupt Vector to switch to TSR and back



#### Management By: Compiler

Memory Type : Overlay



#### **Software Development:**

- Compiled with overlay mode ON
- Program compiled into fixed sized OVERLAYS
- Each overlay can be loaded and run independently in RAM
- OS notified of <u>complete</u> program's existence
- Program executes to completion then terminates
  - Master and slave frames
- OS notified of termination and removed from RAM

Fixed Size Overlay 1 Overlay 2 Overlay 3

Main() and swapper lib-function

Swap in/out from RAM



Permits very large programs to run in smaller RAM



Management By: OS

Memory Type : Virtual Memory

Virtual Address Space





Permits very large programs to run in smaller RAM

Vybihal (c) 2015



## Virtual Memory Method

- 1.Launch a new program
  - 1. Convert it into a process
  - 2. Convert the code into pages
  - 3. All pages "virtually" loaded into VM
  - 4.A <u>subset</u> of pages actually loaded into RAM (Frames)
  - 5.Memory map between VM and RAM
- 2.Execute program
  - 1.Instructions execute until end of page
  - 2. Search for next page in RAM
  - 3.If not found → "Page Fault" → do overlay
  - 4. Continue executing program from new loaded page





#### VM Properties

- Programmer addresses code based on VM address
- OS, therefore, constructed to manages code from VM space (from that point of view)
- BUT, code must actually execute on real hardware = RAM and CPU
- THEREFORE, need to convert all addresses to real RAM values
- This must be a fast process
- This must take into account the page / frame duality of this technique





#### Part 2

Virtual Memory Specifics





## Mapping from VM to RAM



Therefore:  $RAM = 2^{18} * 2^{12} = 1 GB$  (frames)

 $VM = 2^{20} * 2^{12} = 4 GB$  (pages)

McGill

Vybihal (c) 2015

14



#### Remember

| Storage       | Technology                    | Speed                            | Cost          |
|---------------|-------------------------------|----------------------------------|---------------|
| CPU Registers | Flip-flops                    | 1-5 ns                           | \$250 - \$300 |
| Cache         | SRAM                          | 5 – 25 ns                        | \$100 - \$250 |
| RAM           | DRAM                          | 60 – 120 ns                      | \$5 - \$10    |
| Disk          | Magnetic charge  – mechanical | 10 Million ns –<br>20 million ns | \$0.1 - \$0.2 |



~ per 1 Meg





#### Huge Cost of Page Faults

- Page faults imply load overlay from disk!
- How big should a page be?
  - Amortization of disk access time
  - Common page sizes: 4, 16, 32, 64 KB
- OS driven, therefore algorithmic selection:
  - Page loading order
  - Disk drive considerations





### Page Loading Order

- On demand
  - When we need a page get the overlay from disk
  - What if all the frames are filled? Which one do we overlay? (the victim)
- Is there a best overlay selection procedure?
  - Least Recently Used frame
  - First Come First Serve frame replacement
  - Replace all frames of another process
  - Randomly select a frame and overlay





#### Disk Drive Considerations

- Dirty Pages
  - A page in RAM whose data has changed has been selected to be overlaid
  - Need to write that page back to disk (write-back)
  - Read in new page becomes also write out old page
- Byte or Block Disk Access
  - Which is faster?
  - Merging block with buffer & frame improves speed
    - Seek time for next byte is skipped
    - For block we increment pointer after first seek
    - Using DMA also good
    - Using Interrupts also good





#### Two Types of VM

- Paging
  - Fixed size overlays
  - Overlay matches frame size
  - Addressing: Page # Offset
- Segmentation
  - Variable sized overlays
  - Overlays are multiples of frame size
    - This is true for simplicity
    - Can also be implemented with true variableness
  - Addressing: Segment # Page # Offset





## Paging Hardware







#### The Page Table







McGill

Vybihal (c) 2015

2



#### Faster Address Translation





### Implementation

Virtual address





#### Flowchart





McGill

Vybihal (c) 2015



#### **Statistics**





25



#### Part 3

#### The SPIM MMU

(Memory Management Unit)

(Optional Material for student to read – not covered in class)





### Memory Translation System

Process no. Virtual page no. **ASID VPN** Offset Page table (RAM) TLB **VPN ASID PFN** Flags **PFN** Flags Mask Offset **PFN** McGill Vybihal (c) 2015





#### Co-Processor 0 MMU Registers

|        | EntryH              | i register (TLB key fields | ) R300 | 0-style ( | CPUs |   |   |
|--------|---------------------|----------------------------|--------|-----------|------|---|---|
|        | 31                  |                            | 1211   |           | 6 5  | 5 | 0 |
|        | *                   | VPN                        |        | ASID      |      | 0 |   |
|        | TIP I               | (:- - - -                  |        |           |      |   |   |
| EntryH | 1 register (ILB key | fields) R4000-style CPUs   |        |           |      |   |   |
| 12 12  | 61                  |                            | 13 12  | 8         | 7    |   |   |
| 63 62  | 01                  |                            | 1012   |           |      |   | 0 |

#### Notes:

VPN, virtual page number ASID, address space id R, address region PFN, VPN high order bits N, non-cacheable C, cache algorithm D, dirty bit write enable V, valid boolean G, global address shared 0, zeros

EntryLo register (TLB data fields) R3000-style CPUs

12 11 10 9 8 7 0

PFN N D V G 0

PageMask register 64-bit CPUs only

| 31 |   | 25         | 24   | 13 | 12 | 0 |
|----|---|------------|------|----|----|---|
|    | 0 | or grades. | Mask |    | 0  |   |

McGill

28

**COMP 273** 

TABLE 6.1 CPU control registers for memory management

| Register<br>mnemonic | CP0<br>register<br>number | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
|----------------------|---------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| EntryHi<br>EntryLo/  | 10<br>2 2                 | Together these registers hold everything needed for a TLB entry. All reads and writes to the TLB must be staged through them. <b>EntryHi</b> holds the VPN and ASID; <b>EntryLo</b> holds the PFN and flags.                                                                                                                                                                                                                                                                                                                                                                                  |
| EntryLo0<br>EntryLo1 | 3                         | The field <b>EntryHi</b> ( <b>ASID</b> ) does double duty, since it remembers the currently active ASID.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
| PageMask             | 5<br>ohds-0000            | In some CPUs (all 64-bit CPUs to date) each entry maps two consecutive VPNs to different physical pages, specified independently by two registers called <b>EntryLo0</b> and <b>EntryLo1</b> .                                                                                                                                                                                                                                                                                                                                                                                                |
|                      | YIUI                      | <b>EntryHi</b> grows to 64 bits in 64-bit CPUs but in such a way as to preserve the illusion of a 32-bit layout for software that doesn't need long addresses.                                                                                                                                                                                                                                                                                                                                                                                                                                |
|                      |                           | <b>PageMask</b> can be used to create entries that map pages bigger than 4KB; see Section 6.3.1.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
| Index                | 0                         | This determines which TLB entry will be read/written by appropriate instructions.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
| Random               | 1                         | This pseudo-random value (actually a free-running counter) is used by a tlbwr to write a new TLB entry into a randomly selected location. Saves time when processing TLB refill traps, for software that likes the idea of random replacement (there is probably no viable alternative).                                                                                                                                                                                                                                                                                                      |
| Context              | al <b>4</b> is red        | These are convenience registers, provided to speed up the processing of                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
| Xcontext             | 20<br>rog lfow i          | TLB refill traps. The high-order bits are read/write; the low-order bits are taken from the VPN of the address that couldn't be translated.  The register fields are laid out so that, if you use the favored arrangement of memory-held copies of memory translation records, then following a TLB refill trap Context will contain a pointer to the page table record needed to map the offending address. See Section 6.3.5.                                                                                                                                                               |
|                      |                           | **Roontext* does the same job for traps from processes using more than 32-bits of effective address space; a straightforward extension of the Context layout to larger spaces would be unworkable because of the size of the resulting data structures. Some 64-bit CPU software is happy with 32-bit virtual address spaces, but for when that's not enough 64-bit CPUs are equipped with "mode bits" SR(UX), SR(KX) which can be set to cause an alternative TLB refill handler to be invoked; in turn that handler can use **Xcontext* to support a huge but manageable page table format. |





|                | 31   | I CPUs<br>30                                 | 1413 8                                        | 7                         |                                | 0                                  | Lu    | geMask bit |                |       | Page size  |
|----------------|------|----------------------------------------------|-----------------------------------------------|---------------------------|--------------------------------|------------------------------------|-------|------------|----------------|-------|------------|
| and the second | Р    | escally X                                    | Index                                         |                           | X                              |                                    | , and | 24–21      | 20–17          | 16–13 |            |
| А              | l M  | IPS III and highe                            | er CPUs to date                               |                           |                                |                                    |       | 0000       | 0000           | 0000  | 4KB        |
| :10            | 31   | 사용하는 기업을 가라면 그렇게 되었다고 있다면 하다 다른              | JAN 18 19 19 19 19 19 19 19 19 19 19 19 19 19 | 6 5                       |                                | 0                                  |       | 0000       | 0000           | 0011  | 16KB       |
|                | P    |                                              | X                                             |                           | Index                          | <del>ms g</del> osts the           |       | 0000       | 0000           | 1111  | 64KB       |
| RE 6.5         | F    | ields in the <b>Inde</b>                     | x register                                    |                           | i e kanar<br>I e i i i i i i i | kgi ast bacca<br>a, mit dana       |       | 0000       | 0011           | 1111  | 256KB      |
|                |      |                                              |                                               |                           |                                |                                    |       | 0000       | orlonaria da A |       | 43.60      |
| - 1174         |      | <i>subskitte</i> best ()<br>Verst 1 Past sed |                                               |                           | veralization.                  | to a vanish                        |       | 0000       | 1111           | 1111  | 1MB        |
| 32             |      | t CPUs to date                               | 14 13                                         | 8.7                       | recalcule<br>The 2             | Ar e productivi<br>Alla shall have |       | 0000       | 1111<br>1111   | 1111  | 1MB<br>4MB |
|                |      | t CPUs to date                               | 14 13<br>Ran                                  | 8 7<br>dom                | X                              |                                    |       |            |                | 1900  |            |
| 31             |      | X                                            |                                               | the state of the state of | X                              |                                    |       | 0011       | 1111           | 1111  | 4MB        |
| 31<br>         | 4-bi |                                              | Ran                                           | the state of the state of | X 65                           | 0                                  |       | 0011       | 1111           | 1111  | 4MB        |

FIGURE 6.6 Fields in the Random register

#### Notes:

- P, valid index found bool
- X, address value
- Index, TLB position
- Random, random index (auto)
- PTEBase, start page table ptr

• 0, zeros

McGill

Context register for R4x00 and subsequent CPUs

31

63 23 22 **PTEBase** Bad VPN2

21 20

**Bad VPN** 

Context register for R3x00 CPUs

**PTEBase** 

• Bad VPN, address exception ptr XContext register for R4x00 and subsequent CPUs only

| 63   | cipos presid | 33 | 32 | 31 30 |          | 4 3      |   | 0     |
|------|--------------|----|----|-------|----------|----------|---|-------|
| F .5 | PTEBase      |    | R  |       | Bad VPN2 | Light of | 0 | c. 34 |

FIGURE 6.7 Fields in the Context/XContext registers

Vybihal (c) 2015

2 1



#### MMU Control Instructions

- tlbr # read TLB entry into index
- tlbwi # write TLB entry from index
- tlbwr # write TLB entry selected randomly
- tlbp # TLB lookup (uses VPN & ASID)





#### Code Example

```
#include <mips/r3kc0.h>
LEAF (mips_init_tlb)
                                  # save ASID
   mfc0 t0,C0 ENTRYHI
                                  # tlblo = !valid
   mtc0 zero, C0_ENTRYLO
   1i a1,NTLBID<<TLBIDX_SHIFT # index</pre>
                                  # tlbhi = impossible VPN
   1i
         a0,KSEG1_BASE
    .set noreorder
         a1,1<<TLBIDX_SHIFT
   subu
   mtc0 a0,C0_ENTRYHI
   mtc0 a1,C0_INDEX
                                  # increment VPN, so all entries differ
   addu a0,0x1000
   bnez
          a1,1b
                                  # in branch delay slot
   tlbwi
           reorder
    .set
                                 # restore ASID
   mtc0
          t0,C0 ENTRYHI
           ra
END(mips init_tlb)
```



#### **TLB** Initialization



## TLB Exception Code

.set noreorder

.set noat

TLBmissR3K:

mfc0 k1,C0\_CONTEXT

mfc0 k0,C0\_EPC

lw k1,0(k1)

nop

mtc0 k1,C0\_ENTRYLO

nop

tlbwr

jr k0

rfe

.set at

.set reorder

(1) Get address of page table

# (2) Get return address

# (3) Get contents pointed to

# (4) Wait, load takes 2 clock ticks

# (5) LO = k1, Hi auto loaded

# (6) Wait again...

# (7) Write randomly to TLB

# (8) Return to user program

**# (9)** Delay slot execution... restore CPU state in SR

